Hybrid Syntactic Category Induction

Author

  • Bryan Jurish
Abstract

Much research has been devoted to the task of learning lexical classes from unannotated input text. Two of the chief difficulties facing any approach to the unsupervised induction of lexical classes are token-level ambiguity and the classification of rare and unknown words. Following the work of previous authors, the current approach treats the initial stage of syntactic category induction as a clustering problem over a small number of highly frequent word types. An iterative procedure, which uses Zipf’s law to generate the clustering schedule, then classifies less frequent words based on the monotonic Bernoulli entropy of expected co-occurrence probability with respect to the clusters output by the previous stage. This procedure employs a fuzzy cluster membership heuristic to approximate type-level ambiguity and reduce error propagation in a simulated melting procedure. In a second processing phase, the cluster membership probabilities output by the final clustering stage are used in a procedure for the context-dependent resolution of token-level ambiguity. The induced classifications are evaluated with a meta-modelling strategy intended to capture their expected linguistic utility.


Similar resources

Language play facilitates language learning: Optimizing the input for gender-like category induction

Gender induction has been claimed to be virtually impossible unless nouns provide reliable semantic or phonological gender-relevant cues. However, learners might exploit syntactic cues, such as definite articles, to infer the gender of gender-unmarked nouns. In children's poems and songs, such syntactic cues are presented in a highly structured fashion. We assessed gender-like category inductio...


Complexity of Grammar Induction for Quantum Types

Most categorical models of meaning use a functor from the syntactic category to the semantic category. When semantic information is available, the problem of grammar induction can therefore be defined as finding preimages of the semantic types under this forgetful functor, lifting the information flow from the semantic level to a valid reduction at the syntactic level. We study the complexity o...


Cross-Lingual Induction for Deep Broad-Coverage Syntax: A Case Study on German Participles

This paper is a case study on cross-lingual induction of lexical resources for deep, broad-coverage syntactic analysis of German. We use a parallel corpus to induce a classifier for German participles which can predict their syntactic category. By means of this classifier, we induce a resource of adverbial participles from a huge monolingual corpus of German. We integrate the resource into a Ge...


Part of Speech Induction using Non-negative Matrix Factorization

Unsupervised part-of-speech induction involves the discovery of syntactic categories in a text, given no additional information other than the text itself. One requirement of an induction system is the ability to handle multiple categories for each word, in order to deal with word sense ambiguity. We construct an algorithm for unsupervised part-of-speech induction, treating the problem as one o...


Syntactic Pattern Recognition from Observations: A Hybrid Technique

This paper presents a novel technique for automated learning from observations. The technique arranges in a row four traditional pattern recognition approaches (numeric, logic, statistical and finally syntactic) within a unifying framework. Each processing step is conceived as a transformation of the input dataset from one state to another. The proposed technique considers measurable observatio...




Publication date: 2005